- A buzzword I don't really like, but sadly applies to my work.
- MLOps: Overview, Definition & Architecture
- "Examine how ML processes can be automated & operationalized"
- Methodology
- Literature survey
- Interviews
- Principles = Best Practices
- CI/CD automation – fast feedback for build, test, delivery & deploy
- Workflow orchestration
- Reproducibility – same inputs, code & config yield the same results
- Versioning – data, model, code for reproduction and tracing
- Collaboration – on data, model and code
- Continuous ML training & evaluation
- monitoring
- feedback loop
- automated ML workflow pipeline
- + eval run to check for changes in model quality (gate sketch after this list)
- ML Metadata tracking/logging – full traceability
- Continuous Monitoring – periodic assessment of data, model, code, infra, model perf
- Feedback loops – eval -> engineering, monitoring -> scheduler, etc.
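- Rough sketch of the retrain → evaluate → gate loop plus metadata logging, to make the continuous-training and metadata-tracking principles concrete. Everything here (function names, the toy model, the log schema) is my own placeholder, not from the paper:

```python
import hashlib
import json
import time

# Placeholder training stack – all names are my own, not the paper's.
def train_model(training_data):
    """'Train' a candidate model (toy: just store the mean)."""
    return {"mean": sum(training_data) / len(training_data)}

def evaluate(model, holdout):
    """Score on held-out data (toy: negative mean absolute error)."""
    return -sum(abs(x - model["mean"]) for x in holdout) / len(holdout)

def continuous_training_step(training_data, holdout, current_score, metadata_log):
    """One retrain -> eval -> gate iteration of the automated pipeline."""
    candidate = train_model(training_data)
    score = evaluate(candidate, holdout)

    # ML metadata tracking: data hash + score + time, for full traceability.
    metadata_log.append({
        "data_hash": hashlib.sha256(json.dumps(training_data).encode()).hexdigest(),
        "score": score,
        "timestamp": time.time(),
    })

    # Eval gate: promote the candidate only if quality didn't degrade.
    if score >= current_score:
        return candidate, score
    return None, current_score

log = []
model, best = continuous_training_step([1.0, 2.0, 3.0], [2.0, 2.5], float("-inf"), log)
print(model, round(best, 3), log[-1]["data_hash"][:8])
```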
- Components
- CI/CD automation
- Source code repository – stores & versions code
- Workflow Orchestration – DAGs (toy DAG sketch after this list)
- Feature Store – offline & online stores (sketch after this list)
- Model Training Infrastructure
- Model registry – trained models + metadata
- ML Metadata store
- Model serving component
- Monitoring component – includes TensorBoard
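- Toy version of the orchestration idea: declare step dependencies as a DAG, then execute in topological order. Real orchestrators (Airflow etc.) add scheduling, retries and parallelism on top of exactly this ordering. The step names are made up:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Made-up pipeline steps standing in for real orchestrator tasks.
def ingest():    print("ingest raw data")
def featurize(): print("compute features")
def train():     print("train model")
def evaluate():  print("evaluate model")
def deploy():    print("deploy if eval passed")

# The DAG: task -> set of tasks it depends on.
dag = {
    "featurize": {"ingest"},
    "train":     {"featurize"},
    "evaluate":  {"train"},
    "deploy":    {"evaluate"},
}
tasks = {f.__name__: f for f in (ingest, featurize, train, evaluate, deploy)}

# Execute in dependency order.
for name in TopologicalSorter(dag).static_order():
    tasks[name]()
```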
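- And the offline/online split of the feature store in ~20 lines – write once, serve twice. The schema and names are mine, for illustration only:

```python
# Minimal feature store sketch: offline log for training, online view for serving.
class FeatureStore:
    def __init__(self):
        self.offline = []   # full history, source for batch training sets
        self.online = {}    # latest value per entity, for low-latency serving

    def ingest(self, entity_id, features):
        """Write once, serve twice: append to offline log, upsert online view."""
        self.offline.append((entity_id, features))
        self.online[entity_id] = features

    def training_frame(self):
        """Offline store: historical rows for batch training."""
        return list(self.offline)

    def serve(self, entity_id):
        """Online store: lookup of the freshest features at inference time."""
        return self.online.get(entity_id)

store = FeatureStore()
store.ingest("user_42", {"clicks_7d": 10})
store.ingest("user_42", {"clicks_7d": 12})
print(store.serve("user_42"))       # {'clicks_7d': 12} – latest only
print(len(store.training_frame()))  # 2 – full history
```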
- People – role boundaries aren't clean in practice
- Business stakeholder
- Solution "architect"
- Data scientist / ML Engineer
- Data Engineer (Feature engineer)
- Software Engineer
- DevOps
- ML Engineer / ML Ops engineer
- <Standard lifecycle diagram>
- Can have the monitoring system forward drift alerts to the primary system (sketch below)
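- Minimal sketch of that feedback loop: a crude mean-shift drift score (a real monitor would use a KS test or PSI instead) that forwards an alert to a retraining hook. The hook and threshold are hypothetical:

```python
import statistics

def drift_score(reference, live):
    """Crude drift proxy: shift of the live mean, in reference std-devs."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference) or 1.0  # guard against zero spread
    return abs(statistics.mean(live) - ref_mean) / ref_std

def monitor(reference, live, trigger_retraining, threshold=2.0):
    """Feedback loop: monitoring forwards a drift alert to the primary system."""
    score = drift_score(reference, live)
    if score > threshold:
        trigger_retraining(score)  # hypothetical hook, e.g. into the scheduler

monitor(
    reference=[1.0, 1.1, 0.9, 1.05, 0.95],
    live=[2.0, 2.1, 1.9, 2.05],
    trigger_retraining=lambda s: print(f"drift score {s:.1f} -> retrain"),
)
```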
- Intersection of ML, SWE, DevOps, Data Engineering
- Challenges: organizational, ML-system, and operational headaches
- Conclusion:
- In the real world, we observe data scientists still managing ML workflows manually to a great extent. The paradigm of Machine Learning Operations (MLOps) addresses these challenges.
- Follow ups
- Contrast this paper with existing solutions and systems
- Point S., E. to this paper for interview questions